Goto

Collaborating Authors

 queue length






Finite-Time Minimax Bounds and an Optimal Lyapunov Policy in Queueing Control

Liu, Yujie, Tan, Vincent Y. F., Xu, Yunbei

arXiv.org Artificial Intelligence

We introduce an original minimax framework for finite-time performance analysis in queueing control and propose a surprisingly simple Lyapunov-based scheduling policy with superior finite-time performance. The framework quantitatively characterizes how the expected total queue length scales with key system parameters, including the capacity of the scheduling set and the variability of arrivals and departures across queues. To our knowledge, this provides the first firm foundation for evaluating and comparing scheduling policies in the finite-time regime, including nonstationary settings, and shows that the proposed policy can provably and empirically outperform classical MaxWeight in finite time. Within this framework, we establish three main sets of results. First, we derive minimax lower bounds on the expected total queue length for parallel-queue scheduling via a novel Brownian coupling argument. Second, we propose a new policy, LyapOpt, which minimizes the full quadratic Lyapunov drift-capturing both first- and second-order terms-and achieves optimal finite-time performance in heavy traffic while retaining classical stability guarantees. Third, we identify a key limitation of the classical MaxWeight policy, which optimizes only the first-order drift: its finite-time performance depends suboptimally on system parameters, leading to substantially larger backlogs in explicitly characterized settings. Together, these results delineate the scope and limitations of classical drift-based scheduling and motivate new queueing-control methods with rigorous finite-time guarantees.



Z-Merge: Multi-Agent Reinforcement Learning for On-Ramp Merging with Zone-Specific V2X Traffic Information

Ibork, Yassine, Won, Myounggyu, Das, Lokesh

arXiv.org Artificial Intelligence

Ramp merging is a critical and challenging task for autonomous vehicles (A Vs), particularly in mixed traffic environments with human-driven vehicles (HVs). Existing approaches typically rely on either lane-changing or inter-vehicle gap creation strategies based solely on local or neighboring information, often leading to sub-optimal performance in terms of safety and traffic efficiency. In this paper, we present a V2X (vehicle-to-everything communication)-assisted Multiagent Reinforcement Learning (MARL) framework for on-ramp merging that effectively coordinates the complex interplay between lane-changing and inter-vehicle gap adaptation strategies by utilizing zone-specific global information available from a roadside unit (RSU). The merging control problem is formulated as a Multiagent Partially Observable Markov Decision Process (MA-POMDP), where agents leverage both local and global observations through V2X communication. To support both discrete and continuous control decisions, we design a hybrid action space and adopt a parameterized deep Q-learning approach. Extensive simulations, integrating the SUMO traffic simulator and the MOSAIC V2X simulator, demonstrate that our framework significantly improves merging success rate, traffic efficiency, and road safety across diverse traffic scenarios.


Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants

Kiggundu, Anthony, Han, Bin, Schotten, Hans D.

arXiv.org Machine Learning

We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically, for unequal service rates and total-time patience, we show that total wait grows linearly so abandonment is inevitable and the probability of a successful jockey vanishes as the backlog approaches towards infinity. Furthermore, under a mild sub-linear error condition both information models yield the same asymptotic limits (robustness). We empirically validate these limits and quantify finite backlog differences. Our findings show that learned and analytic feeds produce different delays, reneging rates and transient jockeying behavior at practical sizes, but converge to the same asymptotic outcome implied by our theory. The results characterize when value-of-information matters (finite regimes) and when it does not (asymptotics), informing lightweight telemetry and decision-logic design for low-cost, jockeying-aware systems.


Convergence of Multiagent Learning Systems for Traffic control

Sen, Sayambhu, Bhatnagar, Shalabh

arXiv.org Artificial Intelligence

Rapid urbanization in cities like Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), often modeling each traffic signal as an independent agent using Q-learning, has emerged as a promising strategy to reduce average commuter delays. While prior work Prashant L A et. al has empirically demonstrated the effectiveness of this approach, a rigorous theoretical analysis of its stability and convergence properties in the context of traffic control has not been explored. This paper bridges that gap by focusing squarely on the theoretical basis of this multi-agent algorithm. We investigate the convergence problem inherent in using independent learners for the cooperative TSC task. Utilizing stochastic approximation methods, we formally analyze the learning dynamics. The primary contribution of this work is the proof that the specific multi-agent reinforcement learning algorithm for traffic control is proven to converge under the given conditions extending it from single agent convergence proofs for asynchronous value iteration.


A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control

Guo, Qing, Li, Xinhang, Chen, Junyu, Guo, Zheng, Li, Xiaocong, Zhang, Lin, Li, Lei

arXiv.org Artificial Intelligence

Leveraging large language models (LLMs) in traffic signal control (TSC) improves optimization efficiency and interpretability compared to traditional reinforcement learning (RL) methods. However, existing LLM-based approaches are limited by fixed time signal durations and are prone to hallucination errors, while RL methods lack robustness in signal timing decisions and suffer from poor generalization. To address these challenges, this paper proposes HeraldLight, a dual LLMs architecture enhanced by Herald guided prompts. The Herald Module extracts contextual information and forecasts queue lengths for each traffic phase based on real-time conditions. The first LLM, LLM-Agent, uses these forecasts to make fine grained traffic signal control, while the second LLM, LLM-Critic, refines LLM-Agent's outputs, correcting errors and hallucinations. These refined outputs are used for score-based fine-tuning to improve accuracy and robustness. Simulation experiments using CityFlow on real world datasets covering 224 intersections in Jinan (12), Hangzhou (16), and New York (196) demonstrate that HeraldLight outperforms state of the art baselines, achieving a 20.03% reduction in average travel time across all scenarios and a 10.74% reduction in average queue length on the Jinan and Hangzhou scenarios. The source code is available on GitHub: https://github.com/BUPT-ANTlab/HeraldLight.